In these exercises, you will use some of the functions from the tidyr package to make datasets tidy.

In several of our exercises (including this one), we will use data on global life expectancy from Gapminder and the Titanic dataset from Kaggle. In addition, for one of the exercises on tidy data, we will use an excerpt from NationMaster data on murders and the intentional homicide rate for 2010.

First of all, copy the following code, paste it into your R script, and run it to load/create the datasets we will use in these exercises.

library(tidyverse)

gap_life <- read_csv("../data/gapminder/life_expectancy_years.csv")
titanic <- read_csv("../data/titanic/titanic.csv")

crime <- tibble(country = rep(c("Germany", "Brazil", "Norway"), 2),
                crime = c(rep("murders", 3), rep("intentional homicide rate", 3)),
                year = 2010,
                value = c(690, 40974, 29, 0.84, 27, 0.68))

1

Have a look at the gap_life dataset. What do you notice?

2

Transform the gap_life dataset into a sensible long format.

You should gather the year columns into a single column/variable. If you are unsure about the arguments of a function, you can always consult its help file by typing (and running) a ? directly followed by the function name (e.g., ?glimpse). NB: This only works if you have previously loaded the package that includes the function.

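Be aware that there is one column in the dataset that you do not want to gather. You can exclude it by passing -variable_name to gather() as a column selection, in addition to the key and value arguments. If you name key and value explicitly, -variable_name can come directly after the data (or first if you use pipes).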
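
As a rough illustration of the gathering step, here is a minimal sketch on a small made-up tibble rather than on gap_life itself. The object names (wide, long, long_alt), the column name life_exp, and all values are hypothetical. The sketch also shows pivot_longer(), which newer versions of tidyr recommend in place of gather().

```r
library(tidyr)
library(tibble)

# Hypothetical wide data: one row per country, one column per year
wide <- tibble(
  country = c("A", "B"),
  `1990` = c(70.1, 65.3),
  `2000` = c(72.4, 68.0)
)

# Gather every column except country into a year/life_exp pair
long <- gather(wide, key = "year", value = "life_exp", -country)

# Equivalent reshaping with pivot_longer() (row order differs from gather())
long_alt <- pivot_longer(wide, cols = -country,
                         names_to = "year", values_to = "life_exp")
```

Both results have one row per country-year combination. Note that the year column is stored as character unless you convert it, for example with convert = TRUE in gather().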

3

Have a look at the crime dataset. What do you notice?

4

Tidy the crime dataset, so that there is only one observation/row for each country.
You should spread() the crime variable.
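
To see what spread() does before applying it to crime, consider this minimal sketch on made-up data. The tibble measurements, its columns id, measure, and value, and the result names are all hypothetical. pivot_wider() is included as the equivalent that newer versions of tidyr recommend.

```r
library(tidyr)
library(tibble)

# Hypothetical long data: two rows per id, one per measure
measurements <- tibble(
  id = rep(c("x", "y"), 2),
  measure = c("height", "height", "weight", "weight"),
  value = c(1.8, 1.6, 80, 55)
)

# Spread the measure column: its values become new column names,
# and the value column fills those new columns
wide <- spread(measurements, key = measure, value = value)

# Equivalent reshaping with pivot_wider()
wide_alt <- pivot_wider(measurements, names_from = measure, values_from = value)
```

The result has one row per id. If the values of the key column contain spaces, the new column names will too, and you will need backticks to refer to them.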

5

Split the Name variable in the titanic dataset into two variables: one that contains only the last name, and one that contains the first name(s) plus title (Mr., Mrs., Dr., etc.).
When looking at the Name variable in the titanic dataset you should notice that the last name is separated from the rest of the name by a comma plus a space.
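
One possible way to split a column at a comma plus space is separate() from tidyr. The following sketch uses invented names, not the Titanic data. The tibble people and the column names full_name, last_name, and first_name are hypothetical.

```r
library(tidyr)
library(tibble)

# Hypothetical names in "Last, Title First" format
people <- tibble(full_name = c("Doe, Mr. John", "Smith, Mrs. Anna Maria"))

# Split at the comma plus space into two new columns;
# the original full_name column is dropped by default
people_split <- separate(people, col = full_name,
                         into = c("last_name", "first_name"),
                         sep = ", ")
```

If some values contain the separator more than once, separate() warns and drops the surplus pieces by default. Setting extra = "merge" keeps everything after the first split in the last column.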